Full Analysis of Philippine Food Prices from 2007 to 2025
Aims:
Compare average prices across different regions for the same food item and year.
Examine how prices of a specific food item have changed over time within a region.
Identify regional price variations for different food items.
Predict future food prices based on historical data.
Code
from itables import init_notebook_mode
import pandas as pd
import numpy as np
from scipy.stats import shapiro
from scipy.stats import kruskal
import seaborn as sns
import statsmodels.api as sm
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.stats import levene
from ipywidgets import interact
import dash
from dash import dcc, html, Input, Output
import plotly.express as px
import plotly.graph_objects as go
import scikit_posthocs as sp
import ipywidgets as widgets
from IPython.display import display, clear_output
from pmdarima import auto_arima
import warnings
import plotly.io as pio
import shiny
from shiny import ui, render, reactive
import nest_asyncio
Code
init_notebook_mode(all_interactive=True)

# Open the file that was already cleaned, analyzed, and saved.
file_path = "../Data/Processed/eda_results.csv"
df = pd.read_csv(file_path)
Inspect the data
Code
# View the first few rows of the dataset
df.head()

# Display the full interactive table
df
(interactive table preview: Region, Province, Food_Items, year, mean, median, Mode, Variance, Standard Deviation, IQR)
Code
# View the columns and data types of the dataset
df.info()

# Summary statistics of the dataset; these may add little here since they
# were already described in the EDA report
df.describe()
(interactive table preview: year, mean, median, Mode, Variance, Standard Deviation, IQR)
Code
# Check the shape of the dataset
df.shape
(91200, 10)
Code
# Get the earliest and latest years in the dataset
print(df['year'].agg(['min', 'max']))
min 2007
max 2025
Name: year, dtype: int64
For comparing average prices across regions for the same food item and year, the closing (`c_`) price is generally the best one to use. Here's why:
Consistency: The closing price is considered the most representative of the market's consensus for that period, since it accounts for the entire trading session and reflects supply and demand dynamics over time.
Standard Usage: The closing price is the most widely used price in market data, so analyses are typically centered on it.
Simplicity: It eliminates the noise created by intra-day fluctuations and focuses on the end-of-session price, which is more relevant for long-term comparisons.
Code
# Filter the dataframe to keep only rows where 'Food_Items' starts with 'c_'
df_filtered = df[df['Food_Items'].str.startswith('c_')]
df_filtered
(interactive table preview: Region, Province, Food_Items, year, mean, median, Mode, Variance, Standard Deviation, IQR)
Code
# Create figure
fig = go.Figure()

# Add traces for each food item, statistic, and region
food_items = df_filtered['Food_Items'].unique()
stats = ['mean', 'median']

for food in food_items:
    for stat in stats:
        df_food = df_filtered[df_filtered['Food_Items'] == food]
        for region in df_food['Region'].unique():
            df_region = df_food[df_food['Region'] == region]
            fig.add_trace(go.Scatter(
                x=df_region['year'],
                y=df_region[stat],
                mode='lines+markers',
                name=f"{food} - {region} ({stat})",
                visible=False
            ))

# Set initial visible traces (first food item, 'mean' stat)
initial_food = food_items[0]
initial_stat = 'mean'
for trace in fig.data:
    if initial_food in trace.name and initial_stat in trace.name:
        trace.visible = True

# Create separate dropdown buttons for food items and statistics
food_buttons = []
stat_buttons = []
for food in food_items:
    visibility = [food in trace.name and initial_stat in trace.name for trace in fig.data]
    food_buttons.append({
        'label': food,
        'method': 'update',
        'args': [{'visible': visibility},
                 {'title': f"{initial_stat.capitalize()} for {food} by Region and Year"}]
    })
for stat in stats:
    visibility = [initial_food in trace.name and stat in trace.name for trace in fig.data]
    stat_buttons.append({
        'label': stat,
        'method': 'update',
        'args': [{'visible': visibility},
                 {'yaxis': {'title': stat.capitalize()}},
                 {'title': f"{stat.capitalize()} for {initial_food} by Region and Year"}]
    })

# Update layout with separate dropdown menus
fig.update_layout(
    title=f"{initial_stat.capitalize()} for {initial_food} by Region and Year",
    xaxis_title="Year",
    yaxis_title=initial_stat.capitalize(),
    template="plotly_white",
    updatemenus=[
        {'buttons': food_buttons, 'direction': 'down', 'showactive': True,
         'x': 0, 'xanchor': 'left', 'y': 1.15, 'yanchor': 'top'},
        {'buttons': stat_buttons, 'direction': 'down', 'showactive': True,
         'x': 0.26, 'xanchor': 'left', 'y': 1.15, 'yanchor': 'top'},
    ]
)

# Show figure
fig.show()
The graph shows the mean (average) or median price of different food items over time for different regions in the Philippines.
Key Observations
1. Trend - Generally, the prices of all food items have increased over the years across most regions. There are fluctuations and periods of stability for some items, but the overall trend is upward.
2. Regional Differences - Prices of food items vary significantly across regions. Some regions consistently have higher prices than others, and price volatility also differs: some regions experience more volatile prices than others.
Factors like weather patterns, government policies, agricultural changes, food transportation costs, and global market trends can influence these prices.
To identify which regions have the highest and lowest prices each year for every food item, the maximum and minimum will be calculated. The range is computed next, to show how wide the gap between the highest and lowest price is.
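The per-year summary boils down to a single groupby. A minimal sketch on a toy frame with the same long-format columns (`Region`, `year`, `Food_Items`, `Price`) used below; the prices are invented for illustration:

```python
import pandas as pd

# Toy long-format data mirroring the melted layout; values are illustrative
toy = pd.DataFrame({
    "Region": ["A", "B", "A", "B"],
    "year": [2020, 2020, 2021, 2021],
    "Food_Items": ["rice"] * 4,
    "Price": [40.0, 55.0, 42.0, 60.0],
})

# Lowest and highest regional price per year, plus their spread
yearly = toy.groupby("year")["Price"].agg(["min", "max"]).reset_index()
yearly["range"] = yearly["max"] - yearly["min"]
print(yearly)
```

A widening `range` column over the years is exactly the signal the later plots look for.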
Code
file_path = "../Data/Interim/cleaned_food_prices.csv"
df_range = pd.read_csv(file_path)

# Drop columns related to the food price index
df_nofpi_range = df_range.drop(columns=['o_food_price_index', 'h_food_price_index',
                                        'l_food_price_index', 'c_food_price_index',
                                        'inflation_food_price_index', 'trust_food_price_index'])

# Convert 'Date' column to datetime format
df_nofpi_range['Date'] = pd.to_datetime(df_nofpi_range['Date'])

# Drop columns related to inflation
df_noinf_range = df_nofpi_range.drop(columns=['inflation_beans', 'inflation_cabbage', 'inflation_carrots',
                                              'inflation_eggs', 'inflation_garlic', 'inflation_meat_beef_chops',
                                              'inflation_meat_chicken_whole', 'inflation_meat_pork',
                                              'inflation_onions', 'inflation_potatoes', 'inflation_rice',
                                              'inflation_tomatoes'])

# Drop columns related to trust scores
df_cleaned_range = df_noinf_range.drop(columns=['trust_beans', 'trust_cabbage', 'trust_carrots', 'trust_eggs',
                                                'trust_garlic', 'trust_meat_beef_chops',
                                                'trust_meat_chicken_whole', 'trust_meat_pork', 'trust_onions',
                                                'trust_potatoes', 'trust_rice', 'trust_tomatoes'])

# Drop unneeded columns
df_unneeded_range = df_cleaned_range.drop(columns=['country', 'City', 'lat', 'lon', 'Province', 'Date', 'month'])

# Reshape from wide to long format (keeping year as part of the identifier)
df_range = df_unneeded_range.melt(id_vars=['Region', 'year'], var_name='Food_Items', value_name='Price')
df_range_filtered = df_range[df_range['Food_Items'].str.startswith('c_')]
df_range_filtered.loc[:, 'Food_Items'] = df_range_filtered['Food_Items'].str.replace('c_', '', regex=True)
df_range_filtered
(interactive table preview: Region, year, Food_Items, Price)
Code
# Dropdown for selecting food items
food_items = df_range_filtered['Food_Items'].unique()
dropdown = widgets.Dropdown(options=food_items, description="Food Item:")

# Output widget for displaying the plot
output = widgets.Output()

def update_plot(selected_food_item):
    with output:
        clear_output(wait=True)  # Ensure only one graph is displayed
        filtered_dfr = df_range_filtered[df_range_filtered['Food_Items'] == selected_food_item]
        price_stats = filtered_dfr.groupby('year')['Price'].agg(['min', 'max', 'mean']).reset_index()

        fig = go.Figure()
        fig.add_trace(go.Scatter(x=price_stats['year'], y=price_stats['min'],
                                 mode='lines+markers', name='Min Price'))
        fig.add_trace(go.Scatter(x=price_stats['year'], y=price_stats['max'],
                                 mode='lines+markers', name='Max Price'))
        fig.add_trace(go.Scatter(x=price_stats['year'], y=price_stats['mean'],
                                 mode='lines+markers', name='Average Price'))
        fig.update_layout(
            title=f'Min, Max, and Average Prices Per Year for {selected_food_item}',
            xaxis_title='Year',
            yaxis_title='Price',
            legend_title='Price Type'
        )
        display(fig)  # Display updated graph

def on_dropdown_change(change):
    update_plot(change['new'])

# Attach event listener
dropdown.observe(on_dropdown_change, names='value')

# Display UI elements once
display(dropdown, output)

# Show initial plot
update_plot(food_items[0])
Code
# Dash app setup
apph = dash.Dash(__name__)

apph.layout = html.Div([
    html.H3("Highest Prices Per Year by Region"),

    # Dropdown for selecting food items
    dcc.Dropdown(
        id='food-item-dropdown',
        options=[{'label': item, 'value': item} for item in df_range_filtered['Food_Items'].unique()],
        value=df_range_filtered['Food_Items'].unique()[0],
        placeholder="Select a food item"
    ),

    # Graph for displaying highest prices
    dcc.Graph(id='highest-price-graph')
])

# Callback to update the graph based on selected food item
@apph.callback(
    Output('highest-price-graph', 'figure'),
    [Input('food-item-dropdown', 'value')]
)
def update_graph(selected_food_item):
    # Filter data for the selected food item
    filtered_dfr = df_range_filtered[df_range_filtered['Food_Items'] == selected_food_item]

    # Find the highest price per year and the corresponding region
    highest_prices = (
        filtered_dfr.loc[filtered_dfr.groupby('year')['Price'].idxmax()]
        .reset_index(drop=True)
    )

    # Create the bar chart
    fig = px.bar(
        highest_prices,
        x='year',
        y='Price',
        color='Region',  # Highlight the region in the bar color
        title=f'Highest Prices Per Year for {selected_food_item}',
        labels={'Price': 'Price', 'year': 'year', 'Region': 'Region'}
    )
    return fig

# Run the app
if __name__ == '__main__':
    apph.run_server(debug=True, port=8051, mode='inline', name="apph")
Code
# Dash app setup
appl = dash.Dash(__name__)

appl.layout = html.Div([
    html.H3("Lowest Prices Per Year by Region"),

    # Dropdown for selecting food items
    dcc.Dropdown(
        id='food-item-dropdown',
        options=[{'label': item, 'value': item} for item in df_range_filtered['Food_Items'].unique()],
        value=df_range_filtered['Food_Items'].unique()[0],
        placeholder="Select a food item"
    ),

    # Graph for displaying lowest prices
    dcc.Graph(id='lowest-price-graph')
])

# Callback to update the graph based on selected food item
@appl.callback(
    Output('lowest-price-graph', 'figure'),
    [Input('food-item-dropdown', 'value')]
)
def update_graph(selected_food_item):
    # Filter data for the selected food item
    filtered_dfr = df_range_filtered[df_range_filtered['Food_Items'] == selected_food_item]

    # Find the lowest price per year and the corresponding region
    lowest_prices = (
        filtered_dfr.loc[filtered_dfr.groupby('year')['Price'].idxmin()]
        .reset_index(drop=True)
    )

    # Create the bar chart
    fig = px.bar(
        lowest_prices,
        x='year',
        y='Price',
        color='Region',  # Highlight the region in the bar color
        title=f'Lowest Prices Per Year for {selected_food_item}',
        labels={'Price': 'Price', 'year': 'year', 'Region': 'Region'}
    )
    return fig

# Run the app on a different port so it does not interfere with the other app
if __name__ == '__main__':
    appl.run_server(debug=True, port=8052, mode='inline', name="appl")
Code
# Dash app setup
apprange = dash.Dash(__name__)

apprange.layout = html.Div([
    html.H3("Price Range of Food Items per Year Across Regions"),

    # Dropdown for selecting food items
    dcc.Dropdown(
        id='food-item-dropdown',
        options=[{'label': item, 'value': item} for item in df_range_filtered['Food_Items'].unique()],
        value=df_range_filtered['Food_Items'].unique()[0],
        placeholder="Select a food item"
    ),

    # Graph for displaying the price range (min and max) per year
    dcc.Graph(id='price-range-graph')
])

# Callback to update the graph based on selected food item
@apprange.callback(
    Output('price-range-graph', 'figure'),
    [Input('food-item-dropdown', 'value')]
)
def update_graph(selected_food_item):
    # Filter data for the selected food item
    filtered_df = df_range_filtered[df_range_filtered['Food_Items'] == selected_food_item]

    # Get the minimum and maximum price per year
    price_range = (
        filtered_df.groupby('year')['Price']
        .agg(['min', 'max'])
        .reset_index()
    )

    # Create the graph (min and max prices as lines)
    fig = px.line(
        price_range,
        x='year',
        y=['min', 'max'],
        title=f'Price Range (Min & Max) Per Year for {selected_food_item} Across Regions',
        labels={'year': 'year', 'value': 'Price'},
        line_shape='linear'
    )
    return fig

# Run the app
apprange.run_server(debug=True, port=8053, mode='inline', name="apprange")
Key Observations:
1. Overall Trend - Both the minimum and maximum prices show a general upward trend over the years, indicating an overall increase in the price of every food item. Some items recorded their sharpest price increases in 2020, likely due to the COVID-19 pandemic and its impact on supply chains.
2. Price Range - The gap between the minimum and maximum prices for most items widens significantly in 2020 and 2021, suggesting greater variation in prices across regions during these years.
Possible causes:
1. COVID-19 Pandemic - The pandemic disrupted supply chains and increased demand for certain food items, which could have led to price volatility.
2. Regional Differences - Differences in the pandemic's impact on regional economies and agricultural production could have contributed to price variations.
To confirm whether the regions really do differ in price over the years, hypothesis testing must be done. Please refer to .._testing.ipynb for why the Kruskal–Wallis test was used here.
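The Kruskal–Wallis test itself is a single SciPy call taking one sample per group. A minimal sketch on synthetic per-region price samples (the arrays and their parameters are invented; one region is deliberately shifted so the test fires):

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
# Synthetic price samples, one array per region; region_c is shifted upward
region_a = rng.normal(50, 5, 100)
region_b = rng.normal(50, 5, 100)
region_c = rng.normal(70, 5, 100)

# H-statistic and p-value; a small p-value means at least one region's
# price distribution differs from the others
h, p = kruskal(region_a, region_b, region_c)
print(f"H = {h:.1f}, p = {p:.3g}")
```

Note that a significant result only says *some* region differs, which is why the Dunn's post-hoc test is needed afterwards to find the differing pairs.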
Code
# Open the csv file that was already analyzed with the Kruskal-Wallis test.
hypothesis_results = "../Data/Processed/hypothesis_testing_result.csv"
hyp_df = pd.read_csv(hypothesis_results)
Code
# View the first few rows of the dataset
print(hyp_df.head())
Year Food_Item H-statistic p-value Result
0 2007 beans NaN NaN Not Significant
1 2007 c_beans 1077.368029 2.491117e-218 Significant
2 2007 c_cabbage 1105.762042 2.066749e-224 Significant
3 2007 c_carrots 1093.290901 9.693593e-222 Significant
4 2007 c_eggs 905.101619 1.726764e-181 Significant
Code
# Filter the dataframe to keep only rows where 'Food_Item' starts with 'c_'
hyp_df_filtered = hyp_df[hyp_df['Food_Item'].str.startswith('c_')]
print(hyp_df_filtered)

# Alternatively, just count the number of unique Results
hyp_df_filtered["Result"].value_counts()
(interactive table preview: value counts of Result)
This means that, for every item, prices differ across regions in every year. But which regions differ from which?
Code
# Define the available items and years for the dropdowns
items = df_range_filtered['Food_Items'].unique().tolist()
years = df_range_filtered['year'].unique().tolist()

# Step 1: Create the function that will update the graph based on selected item and year
def update_heatmap(selected_item, selected_year):
    # Filter for the selected item and year
    filter_df = df_range_filtered[(df_range_filtered['Food_Items'] == selected_item) &
                                  (df_range_filtered['year'] == selected_year)]

    # Perform Dunn's test to check which regions differ
    regions = filter_df['Region'].unique()
    dunn_result = sp.posthoc_dunn(filter_df, val_col='Price', group_col='Region')

    # Prepare heatmap data (p-values matrix)
    p_values_matrix = np.zeros((len(regions), len(regions)))
    for i, region1 in enumerate(regions):
        for j, region2 in enumerate(regions):
            if region1 != region2:
                p_values_matrix[i, j] = dunn_result.loc[region1, region2]
            else:
                p_values_matrix[i, j] = np.nan  # No comparison within the same region

    # Create a DataFrame for the heatmap using the regions as both rows and columns
    heatmap_df = pd.DataFrame(p_values_matrix, columns=regions, index=regions)

    # Replace NaN values with 1.0 (indicating no comparison)
    heatmap_df = heatmap_df.fillna(1.0)

    # Custom colorscale: red near 0 (significant), green near 1 (not significant)
    custom_colorscale = [
        [0, "darkred"],
        [0.166, "red"],
        [0.332, "lightcoral"],
        [0.5, "yellow"],
        [0.66, "palegreen"],
        [0.832, "green"],
        [1, "darkgreen"]
    ]

    # Create the heatmap figure with the custom colorscale
    fig = go.Figure(data=go.Heatmap(
        z=heatmap_df.values,
        x=heatmap_df.columns,
        y=heatmap_df.index,
        colorscale=custom_colorscale,
        colorbar=dict(title='p-value'),
        zmin=0,
        zmax=1  # Normalize the range from 0 to 1
    ))

    # Update layout to improve axis labels and graph size
    fig.update_layout(
        title=f"Dunn's Test p-values for {selected_item} in {selected_year} by Region",
        xaxis_title='Region',
        yaxis_title='Region',
        autosize=False,
        width=800,
        height=800,
        margin=dict(l=100, r=100, t=100, b=100),  # Increase margins for better visibility
        xaxis=dict(tickmode='array', tickvals=heatmap_df.columns,
                   ticktext=heatmap_df.columns, tickangle=45),  # Rotate x-axis labels
        yaxis=dict(tickmode='array', tickvals=heatmap_df.index,
                   ticktext=heatmap_df.index, tickangle=0)
    )
    return fig

# Step 2: Create the Dash app
appDunns = dash.Dash(__name__)

# Define the layout with dropdowns and the graph
appDunns.layout = html.Div([
    html.H3("Price Comparison Heatmap by Region"),

    # Dropdown for selecting item
    html.Label('Select Item:'),
    dcc.Dropdown(
        id='item-dropdown',
        options=[{'label': item, 'value': item} for item in items],
        value=items[0],  # default value
        style={'width': '50%'}
    ),

    # Dropdown for selecting year
    html.Label('Select Year:'),
    dcc.Dropdown(
        id='year-dropdown',
        options=[{'label': year, 'value': year} for year in years],
        value=years[0],  # default value
        style={'width': '50%'}
    ),

    # Graph for showing the heatmap
    dcc.Graph(id='heatmap-graph')
])

# Step 3: Create callback to update the heatmap based on dropdown selection
@appDunns.callback(
    dash.dependencies.Output('heatmap-graph', 'figure'),
    [dash.dependencies.Input('item-dropdown', 'value'),
     dash.dependencies.Input('year-dropdown', 'value')]
)
def update_graph(selected_item, selected_year):
    return update_heatmap(selected_item, selected_year)

# Run the app
if __name__ == '__main__':
    appDunns.run_server(debug=True, port=8054, mode='inline', name="appDunns")
This heatmap visualizes the results of a Dunn's post-hoc test for significant differences in the prices of food items between regions across the years. The color scale indicates the following:
Red (close to 0): Indicates a statistically significant difference in prices between two regions.
Light Green to Yellow (around 0.2 to 0.8): Suggests a possible difference, but not one that is statistically significant at the chosen significance level (typically 0.05).
Dark Green (close to 1): Indicates no statistically significant difference in prices between the two regions.
Key Observations
Significant Differences Exist: The abundance of red cells indicates statistically significant differences in food prices between almost all regions in any given year.
No Significant Differences: Patches of green indicate pairs of regions where Dunn's test did not find a statistically significant difference in food prices, but there are only a few of them.
Market Average Comparison: The "Market Average" row/column shows how each region's prices compare to the overall average. Dark red cells in this row/column indicate regions with prices significantly different from the market average.
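Beyond eyeballing the colors, the significant pairs can be pulled directly out of the p-value matrix. A sketch on a hypothetical 3x3 matrix with the same symmetric shape `posthoc_dunn` returns (the region names and p-values here are invented for illustration):

```python
import pandas as pd

regions = ["NCR", "Region I", "CAR"]
# Hypothetical symmetric p-value matrix, same layout as scikit-posthocs' output
pvals = pd.DataFrame(
    [[1.000, 0.001, 0.300],
     [0.001, 1.000, 0.040],
     [0.300, 0.040, 1.000]],
    index=regions, columns=regions,
)

alpha = 0.05
# Walk the upper triangle only, so each pair is reported once
pairs = [
    (r1, r2, pvals.loc[r1, r2])
    for i, r1 in enumerate(regions)
    for r2 in regions[i + 1:]
    if pvals.loc[r1, r2] < alpha
]
print(pairs)
```

Applied to the real `dunn_result`, this turns the heatmap into a concrete list of region pairs whose prices differ significantly.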
Food Price Prediction 3 Years into the Future
Code
# Drop unneeded columns
df_unneeded_range = df_cleaned_range.drop(columns=['country', 'City', 'lat', 'lon', 'Province', 'year', 'month'])

# Reshape from wide to long format (keeping Date as part of the identifier)
df_range = df_unneeded_range.melt(id_vars=['Region', 'Date'], var_name='Food_Items', value_name='Price')
df_range_filtered = df_range[df_range['Food_Items'].str.startswith('c_')]
df_range_filtered.loc[:, 'Food_Items'] = df_range_filtered['Food_Items'].str.replace('c_', '', regex=True)

# Sort by date
df = df_range_filtered.sort_values(by=['Region', 'Date'])
Code
# Suppress specific warnings
warnings.filterwarnings("ignore", category=UserWarning, module="sklearn")
warnings.filterwarnings("ignore", category=FutureWarning, module="sklearn")
warnings.filterwarnings("ignore", category=FutureWarning, module="pmdarima")
warnings.filterwarnings("ignore", category=DeprecationWarning, module="pmdarima")

# Convert date column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Sort by date
df = df.sort_values(by=['Region', 'Date'])

# Dropdowns for interactive visualization
regions = df['Region'].unique()
items = df['Food_Items'].unique()
region_dropdown = widgets.Dropdown(options=regions, description='Region:')
item_dropdown = widgets.Dropdown(options=items, description='Food Item:')

def preprocess_data(region, item):
    region_df = df[(df['Region'] == region) & (df['Food_Items'] == item)]
    region_df = (
        region_df.groupby('Date', as_index=False)['Price']
        .mean()  # Aggregate duplicate dates by averaging
    )
    region_df = region_df.set_index('Date').asfreq('MS')  # Ensure monthly frequency
    region_df['Price'] = region_df['Price'].interpolate()  # Fill missing values
    return region_df

def update_graph(region, item):
    region_df = preprocess_data(region, item)
    if len(region_df) < 12:
        print(f"Not enough data for {region} - {item} to forecast.")
        return

    # Use AutoARIMA to find the best model
    model = auto_arima(region_df['Price'], seasonal=True, m=12, stepwise=True,
                       trace=True, max_order=None, stationary=False)

    # Forecast the next 36 months
    forecast = model.predict(n_periods=36)

    # Plot the historical data and forecast
    plt.figure(figsize=(12, 6))
    plt.plot(region_df.index, region_df['Price'], label='Historical Prices')
    plt.plot(pd.date_range(region_df.index[-1], periods=37, freq='MS')[1:],
             forecast, label='Forecast', linestyle='dashed', color='red')
    plt.xlabel('Date')
    plt.ylabel('Price')
    plt.legend()
    plt.title(f'Price Forecast for {region} - {item} (Next 36 Months)')
    plt.show()

interactive_plot = widgets.interactive(update_graph, region=region_dropdown, item=item_dropdown)
display(interactive_plot)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[223], line 30, in update_graph(selected_food_item='cabbage')
     26 filtered_dfr = df_range_filtered[df_range_filtered['Food_Items'] == selected_food_item]
     28 # Find the highest price per year and the corresponding region
     29 highest_prices = (
---> 30     filtered_dfr.loc[filtered_dfr.groupby('year')['Price'].idxmax()]
     31     .reset_index(drop=True)
     32 )
...
KeyError: 'year'

(The earlier Dash callback fails here because df_range_filtered was rebuilt above with a Date column, so it no longer contains a year column for the groupby.)
Key Observations
Historical Price Volatility: The blue line representing historical prices shows significant fluctuations, especially noticeable spikes around 2008 and smaller ones in other years. This indicates that most food prices have been subject to considerable volatility over the past two decades.
Recent Price Increase: There’s a clear upward trend in food prices in the period leading up to the forecast (roughly from 2020 to 2024). This suggests recent factors have been pushing prices higher.
Relatively Stable Forecast: The red dashed line, representing the forecast for the next 36 months, shows a relatively stable and slightly upward trend. The model predicts a continued but moderated increase compared to the recent past.
Possible Explanations
Historical Volatility:
Weather Patterns: Adverse weather events in key agricultural regions can significantly impact supply and cause price volatility.
Changes in Demand: Fluctuations in global demand, driven by population growth or economic factors, can also contribute to price swings.
Recent Price Increase:
Inflation: General inflation in the economy could be pushing up agricultural commodity prices.
Supply Chain Issues: Disruptions to global supply chains (e.g., due to pandemics, conflicts, or trade policies) can lead to shortages and price increases.
Increased Input Costs: Rising costs of fertilizers, fuel, and labor can increase the cost of food production, which could be passed on to consumers.
Relatively Stable Forecast:
Model Assumption: ARIMA models assume that, to some extent, future trends will resemble past patterns. The stable forecast suggests the model has not picked up any strong upward or downward pressures for the next 36 months beyond what is already captured in the recent trend.